Estimation Using the Generalised Weight Share Method: The Case of Record Linkage
Authors
Abstract
Databases are increasingly combined using record linkage methods to expand the amount of available information. When there is no unique identifier on which to match, probabilistic linkage is used: a record on the first file is linked to a record on the second file with a certain probability, and a decision is then made on whether the link is a true link. This process usually requires a certain amount of manual resolution, which is costly in time and staff. It also often leads to a complex linkage; that is, the linkage between the two databases is not necessarily one-to-one, but can be many-to-one, one-to-many, or many-to-many. Two databases combined using record linkage can be seen as two populations linked together. In this paper we consider the problem of producing estimates for one of the populations (the target population) using a sample selected from the other. We assume that the two populations have been linked using probabilistic record linkage. To solve the estimation problem arising from a complex linkage between the population from which the sample is selected and the target population, Lavallée (1995) suggested the use of the Generalised Weight Share Method (GWSM). This method is an extension of the Weight Share Method presented by Ernst (1989) in the context of longitudinal household surveys. The paper first provides a brief overview of record linkage. Second, the GWSM is described. Third, the GWSM is adapted to provide three approaches that take into account the linkage weights produced by record linkage: (1) use all non-zero links with their respective linkage weights; (2) use all non-zero links above a given threshold; and (3) choose the links randomly using Bernoulli trials. For each approach, an unbiased estimator of a total is presented together with a variance formula. Finally, some simulation results are presented that compare the three proposed approaches to the Classical Approach (where the GWSM is used based on links established through a decision rule).
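As a rough illustration of the weight-sharing idea, the sketch below uses one common formulation of the GWSM: each cluster of the target population receives the sum of the linked design weights from the sample, divided by the sum of all its links to the sampled population. The three link-selection helpers mirror approaches (1) to (3) only loosely. The toy data, the function names, the cluster structure, and the assumption that linkage weights lie in [0, 1] are all hypothetical and are not taken from the paper, whose exact estimators may differ.

```python
import itertools
import random

def gwsm_weights(links, pi_a, sample_a, cluster_of):
    """Share weights from sampled units j (population A) to clusters of B.

    links: {(j, k): link value > 0}; pi_a: {j: selection probability};
    cluster_of: {k: cluster id}. Returns {k: estimation weight}.
    """
    num, den = {}, {}
    for (j, k), l in links.items():
        c = cluster_of[k]
        den[c] = den.get(c, 0.0) + l                # links to all of A
        if j in sample_a:
            num[c] = num.get(c, 0.0) + l / pi_a[j]  # links to the sample
    # A cluster with no link at all cannot be reached; it gets weight 0.
    return {k: num.get(c, 0.0) / den[c] if c in den else 0.0
            for k, c in cluster_of.items()}

def estimate_total(links, pi_a, sample_a, cluster_of, y):
    w = gwsm_weights(links, pi_a, sample_a, cluster_of)
    return sum(w[k] * y[k] for k in y)

# Hypothetical link-selection rules (assumed forms of approaches 1-3).
def links_all(theta):
    return {jk: t for jk, t in theta.items() if t > 0}

def links_above(theta, cut):
    return {jk: t for jk, t in theta.items() if t > cut}

def links_bernoulli(theta, rng):
    # Keep each link as a 0/1 indicator with probability theta (assumed in [0, 1]).
    return {jk: 1.0 for jk, t in theta.items() if rng.random() < t}

# Toy populations: three units in A, three units of B in two clusters.
units_a = ["a1", "a2", "a3"]
pi_a = {j: 2 / 3 for j in units_a}          # simple random sample, n = 2 of 3
cluster_of = {"b1": "c1", "b2": "c1", "b3": "c2"}
y = {"b1": 10.0, "b2": 20.0, "b3": 30.0}    # true total is 60
theta = {("a1", "b1"): 0.9, ("a2", "b1"): 0.3,
         ("a2", "b3"): 0.6, ("a3", "b3"): 0.8}

def design_mean(links):
    """Average the estimate over every equally likely sample of size 2."""
    samples = list(itertools.combinations(units_a, 2))
    return sum(estimate_total(links, pi_a, set(s), cluster_of, y)
               for s in samples) / len(samples)

print(design_mean(links_all(theta)))
print(design_mean(links_above(theta, 0.5)))
print(links_bernoulli(theta, random.Random(0)))
```

In this toy setting the design average reproduces the true total whenever every cluster keeps at least one link. A threshold or a Bernoulli draw can remove a cluster's last link, which is the case a real implementation would need to handle.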
Similar resources
Sensorless Speed Control of Switched Reluctance Motor Drive Using the Binary Observer with Online Flux-Linkage Estimation
An adaptive online flux-linkage estimation method for the sensorless control of a switched reluctance motor (SRM) drive is presented in this paper. Sensorless operation is achieved through a binary-observer-based algorithm. To avoid using look-up tables of motor characteristics, which make the system dependent on motor parameters, an adaptive identification algorithm is used to estim...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognising duplicates within a data set. The i...
The effect of structural changes in higher education sector on regional output (Case study: Sistan and Baluchestan Province)
Abstract. The aim of this study is to assess the effect of structural changes in higher education on changes in output in Sistan and Baluchestan Province using structural decomposition analysis (SDA). The input-output tables of this region for the period 2006-2011 have been employed as the database of the model. The structural changes were decomposed into two factors: changes in share of specific sect...
Conditions of Non-Unique Identifiers in Record Linkage Using Japanese Cohort Dataset
Unique identifiers such as name, home address and social security number have been commonly used to link different datasets, and such applications are well published. The theoretical concepts of probabilistic algorithms in record linkage have also been well defined in the literature. However, few studies have reported applications of the probabilistic algorithm using non-unique identifiers. In...
A new method for calculating earthquake characteristics and nonlinear spectra using wavelet theory
In the present study, wavelet theory (WT) and then the nonlinear spectrum response of the acceleration (NSRA) were used to estimate a strong earthquake record for a structure with one degree of freedom. WT was used to estimate the earthquake acceleration mapping with an equal sampling method (WTESM). Therefore, at first, the acceleration recorded in an earthquake using WTESM w...